This paper deals with a general class of transformation models 
that contains many important semiparametric regression models as 
special cases. It develops a self-induced smoothing for the maximum 
rank correlation estimator, resulting in simultaneous point and vari- 
ance estimation. The self-induced smoothing does not require band- 
width selection, yet provides the right amount of smoothness so that 
the estimator is asymptotically normal with mean zero (unbiased) 
and variance-covariance matrix consistently estimated by the usual 
sandwich-type estimator. An iterative algorithm is given for the vari- 
ance estimation and shown to numerically converge to a consistent 
limiting variance estimator. The approach is applied to a data set 
involving survival times of primary biliary cirrhosis patients. Simulation
results are reported, showing that the new method performs
well under a variety of scenarios. 



1. Introduction. Consider the following class of regression models,
with response variable denoted by $Y$ and $(d+1)$-dimensional covariate vector
by $X$,

(1)    $Y = H(X'\beta + \epsilon)$,

where $\beta$ is the unknown parameter vector, $\epsilon$ is the unobserved error term
that is independent of $X$ with a completely unspecified distribution, and $H$
is a monotone increasing, but otherwise unspecified function.

It is easily seen that this class of models contains many commonly used
regression models as its submodels that are especially important in the
econometrics and survival analysis literature. For example, with $H(u) = u$, (1)
becomes the standard regression model with an unspecified error distribution;
with $H(u) = (\lambda u + 1)^{1/\lambda}$ ($\lambda > 0$), the Box-Cox transformation model (Box and
Cox, 1964); with $H(u) = I[u > 0]$, the binary choice model (Maddala, 1983;
McFadden, 1984); with $H(u) = uI[u > 0]$, a censored regression model (Tobin,
1958; Powell, 1984); with $H(u) = \exp(u)$, the accelerated failure time
(AFT) model (Cox and Oakes, 1984; Kalbfleisch and Prentice, 2002); with $\epsilon$
having an extreme value density $f(w) = \exp(w - \exp(w))$, the Cox proportional
hazards regression (Cox, 1972); with $\epsilon$ having the standard logistic
distribution, the proportional odds regression (Bennett, 1983).
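
To illustrate why a rank-based criterion is natural for model (1), the following minimal Python sketch lists several of the transformations mentioned above and checks numerically that each is non-decreasing, so the ordering of $X'\beta + \epsilon$ carries over to $Y$. The names `TRANSFORMS`, `is_nondecreasing`, `GRID` and the value `LAMBDA = 0.5` are illustrative assumptions, and the Box-Cox entry assumes the standard inverse Box-Cox form on the region where $\lambda u + 1 > 0$.

```python
import math

LAMBDA = 0.5  # hypothetical Box-Cox parameter, chosen only for illustration

# Hypothetical catalogue of transformations H(u) from the examples above.
TRANSFORMS = {
    "linear": lambda u: u,
    "box_cox": lambda u: (LAMBDA * u + 1.0) ** (1.0 / LAMBDA),  # needs LAMBDA*u + 1 > 0
    "binary_choice": lambda u: 1.0 if u > 0 else 0.0,
    "censored": lambda u: u if u > 0 else 0.0,
    "aft": math.exp,
}


def is_nondecreasing(h, grid):
    """Check H(u1) <= H(u2) for consecutive grid points u1 < u2."""
    values = [h(u) for u in grid]
    return all(a <= b for a, b in zip(values, values[1:]))


# Grid kept inside (-2, inf) so that the Box-Cox base stays positive.
GRID = [-1.5 + 0.1 * k for k in range(46)]
MONOTONE = {name: is_nondecreasing(h, GRID) for name, h in TRANSFORMS.items()}
```

Every entry of `MONOTONE` should come out `True`; the binary choice and censored cases are only weakly increasing, which is why ties in $Y$ carry no ordering information.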

A basic tool for handling model (1) is the maximum rank correlation 
(MRC) estimator proposed in the econometrics literature by Han (1987). 
Because both the transformation function H and the error distribution are 
unspecified, not all components of $\beta$ are identifiable. Without loss of
generality, we shall assume henceforth that the last component $\beta_{d+1} = 1$. Let
$(Y_1, X_1), \ldots, (Y_n, X_n)$ be a random sample from (1). Han's MRC estimator,
denoted by $\hat\theta_n$, is the maximizer of the following objective function

(2)    $Q_n(\theta) = \frac{1}{n(n-1)} \sum_{i \ne j} I[Y_i > Y_j]\, I[X_i'\beta(\theta) > X_j'\beta(\theta)]$,

where $I[\,\cdot\,]$ denotes the indicator function, $X'$ the transpose of $X$, and $\theta$
the first $d$ components of $\beta$, i.e. $\beta(\theta) = (\theta_1, \ldots, \theta_d, 1)'$. Han (1987) proved
that the MRC estimator $\hat\theta_n$ is strongly consistent under certain regularity
conditions.
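
The rank correlation criterion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `mrc_objective` and `mrc_grid_search` are hypothetical, the data are passed as plain lists, and the grid search (for $d = 1$ only) is an assumed stand-in for whatever optimizer one prefers.

```python
def mrc_objective(theta, ys, xs):
    """Rank correlation Q_n(theta) with beta(theta) = (theta_1, ..., theta_d, 1)'."""
    beta = list(theta) + [1.0]
    index = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    n = len(ys)
    concordant = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and ys[i] > ys[j] and index[i] > index[j]
    )
    return concordant / (n * (n - 1))


def mrc_grid_search(ys, xs, grid):
    """Crude maximiser for d = 1: evaluate Q_n on a user-supplied grid of theta values."""
    return max(grid, key=lambda t: mrc_objective([t], ys, xs))
```

Because only the ordering of the responses enters, replacing `ys` by any strictly increasing transformation of it leaves `mrc_objective` unchanged, which mirrors the fact that $H$ need not be specified.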

An important subsequent development is due to Sherman (1993), who 
made use of the empirical process theory and Hoeffding's decomposition to 
approximate the objective function, viewed as a U-process. He showed that 
$\hat\theta_n$ is, in fact, asymptotically normal under additional regularity conditions. 
Estimation of the transformation function H was studied by Chen (2002), 
who constructed a rank-based estimator and established its consistency and 
asymptotic normality. 

In addition to econometrics, model (1) also encompasses the main
semiparametric models in survival analysis, where right censoring is a major
feature. Under right censoring, there is a censoring variable $C$ and
one observes $\tilde Y_i = Y_i \wedge C_i$ and $\Delta_i = I(Y_i < C_i)$. Khan and Tamer (2007)
constructed the following partial rank correlation function as an extension
of the rank correlation objective function (2),

(3)    $Q_n^*(\theta) = \frac{1}{n(n-1)} \sum_{i \ne j} \Delta_j\, I[\tilde Y_i > \tilde Y_j]\, I[X_i'\beta(\theta) > X_j'\beta(\theta)]$.

They showed that the resulting maximum partial rank correlation estimate
(PRCE) $\hat\theta_n^*$, the maximizer of $Q_n^*(\theta)$, is consistent and asymptotically
normal.
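
The censored-data criterion (3) differs from (2) only in that a pair contributes when the smaller observed time is an actual failure. A minimal Python sketch under that reading follows; the function names `censor` and `prc_objective` and the list-based data layout are assumptions made for illustration.

```python
def censor(y, c):
    """Return the observed time min(y, c) and the failure indicator I(y < c)."""
    return min(y, c), int(y < c)


def prc_objective(theta, obs_times, deltas, xs):
    """Partial rank correlation: like the MRC objective, weighted by Delta_j."""
    beta = list(theta) + [1.0]
    index = [sum(b * v for b, v in zip(beta, x)) for x in xs]
    n = len(obs_times)
    total = 0
    for i in range(n):
        for j in range(n):
            if i != j and obs_times[i] > obs_times[j] and index[i] > index[j]:
                total += deltas[j]  # counts only if the smaller time is uncensored
    return total / (n * (n - 1))
```

With all `deltas` equal to 1 the function reduces to the uncensored rank correlation, which is a quick sanity check on any implementation.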

Crucial for the statistical inference of (1) based on $\hat\theta_n$ is consistent
variance estimation. In standard objective (loss) function derived estimation,
the asymptotic variance is usually estimated by a sandwich-type estimator
of the form $A^{-1}VA^{-1}$, with $A$ being the second derivative of the objective
function and $V$ an estimator of the variance of the first derivative (score). The
challenge here, however, is that $Q_n$ itself is a (discontinuous) step function
that precludes automatic use of differentiation to obtain $A$. Furthermore,
$V$ is also difficult to obtain since the score function cannot be derived
directly from $Q_n$ via differentiation. Sherman (1993) suggested using numerical
derivatives of first and second orders to construct $A$ and $V$. His approach
requires bandwidth selection for the derivative functions. It is unclear how
stable the resulting variance estimator is. Alternatively, one may resort to
the bootstrap (Efron, 1979) or other resampling methods (e.g. Jin et al., 2001).
These approaches require repeatedly solving the maximization of (2), which
is discontinuous and often multidimensional when $d > 1$. The computational
cost could therefore be prohibitive.

In this paper, we develop a self-induced smoothing method for the rank
correlation criterion function (2) so that differentiation can be performed,
while bypassing bandwidth selection. Both point and variance estimators
can be obtained simultaneously in a straightforward way that is typically
used for smooth objective functions. The new method is motivated
by a novel approach proposed in Brown and Wang (2005, 2007), where an 
elegant self-induced smoothing method was introduced for non-smooth es- 
timating functions. Although our approach bears similarity with that of 
Brown and Wang (2005), it is far from clear why such self-induced smooth- 
ing is suitable for the discrete objective function (rank correlation). In fact, 
undersmoothing would make the Hessian (second derivative) unstable while 
oversmoothing would introduce significant bias. Through highly technical 
and tedious derivations, we will show that the proposed method does strike 
a right balance in terms of asymptotic unbiasedness and enough smoothness 
for differentiation (twice). 

The rest of the paper is organized as follows. In Section 2, the new methods 
are described and related large sample properties are developed. In partic- 
ular, we give construction for simultaneous point and variance estimation 
and show that the resulting point estimator is asymptotically normal and the 
variance estimator is consistent. In Section 3, the approach, along with the 
algorithm and large sample properties, is extended to handle survival data 
with right censoring. Simulation results are reported in Section 4, where ap- 
plication to a real data set is also given. Section 5 contains some concluding 
remarks. Additional technical proofs can be found in the Appendix. 

2. Main Results. In this section we develop a self-induced smoothing
method for the rank correlation criterion function defined by (2). It is
divided into three subsections, with the first introducing the method and 
the algorithm, the second establishing large sample properties and the third 
covering proofs. 

2.1. Methods. Since the MRC estimator $\hat\theta_n$ is asymptotically normal
(Sherman, 1993), its difference from the true parameter value, $\hat\theta_n - \theta_0$, should
approximately be a Gaussian noise $Z/\sqrt n$, where $Z \sim N(0, \Sigma)$ is a
$d$-dimensional normal random vector with mean 0 and covariance matrix
$\Sigma$. Assume that $Z$ is independent of the data and let $E_Z$ denote the
expectation with respect to $Z$ given the data. A self-induced smoothing for $Q_n$ is
$\tilde Q_n(\theta) = E_Z Q_n(\theta + Z/\sqrt n)$. The self-induced smoothing using the limiting
Gaussian distribution was originally proposed by Brown and Wang (2005)
for certain non-smooth estimating functions.

To get an explicit form for $\tilde Q_n$, let $\Phi$ be the standard normal distribution
function, $X_{ij} = X_i - X_j$, and $\sigma_{ij} = \sqrt{(X_{ij}^{(1)})'\Sigma X_{ij}^{(1)}}$, where $X_{ij}^{(1)}$ denotes the first
$d$ components of $X_{ij}$. Then, it is easy to see that

(4)    $\tilde Q_n(\theta) = \frac{1}{n(n-1)} \sum_{i \ne j} I[Y_i > Y_j]\, \Phi\left(\sqrt n X_{ij}'\beta(\theta)/\sigma_{ij}\right)$.

We shall use $\tilde\theta_n = \arg\max_{\theta\in\Theta} \tilde Q_n(\theta)$ to denote the corresponding estimator,
which will be called the smoothed maximum rank correlation estimator
(SMRCE). Here and in the sequel, $\Theta$ denotes the parameter space for $\theta$.
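
The closed form of the smoothed criterion follows because $X_{ij}'\beta(\theta + Z/\sqrt n) = X_{ij}'\beta(\theta) + (X_{ij}^{(1)})'Z/\sqrt n$, and the second term is normal with mean 0 and variance $\sigma_{ij}^2/n$. A minimal Python sketch of the resulting formula is given below. The names `norm_cdf` and `smoothed_mrc_objective` are hypothetical, `sigma` is assumed to be a $d \times d$ list of lists, and pairs with $\sigma_{ij} = 0$ are handled by falling back to the plain indicator, which is an assumption of this sketch rather than something stated in the text.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_mrc_objective(theta, ys, xs, sigma):
    """Self-induced smoothed rank correlation for a d x d covariance matrix sigma."""
    d = len(theta)
    beta = list(theta) + [1.0]
    n = len(ys)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j or not ys[i] > ys[j]:
                continue
            xij = [a - b for a, b in zip(xs[i], xs[j])]
            x1 = xij[:d]  # first d components of X_ij
            s2 = sum(x1[r] * sigma[r][s] * x1[s] for r in range(d) for s in range(d))
            lin = sum(b * v for b, v in zip(beta, xij))
            if s2 > 0.0:
                total += norm_cdf(math.sqrt(n) * lin / math.sqrt(s2))
            else:
                # sigma_ij = 0: the term does not depend on theta (assumed fallback)
                total += 1.0 if lin > 0 else 0.0
    return total / (n * (n - 1))
```

Each indicator $I[X_{ij}'\beta(\theta) > 0]$ is thus replaced by a normal distribution function whose effective bandwidth $\sigma_{ij}/\sqrt n$ is determined by the data and by $\Sigma$, not by a tuning parameter.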

Remark 1. Smoothing is an appealing way to obtain a simple solution to the
inference problem associated with the MRCE. If $Q_n$ were a usual smooth
objective function, then its first derivative would be the score function
and its second derivative could be used for variance estimation. Specifically,
if we use $V$ to denote the limiting variance of the score scaled by $n$ and $A$ the
limit of the second derivative, then the asymptotic variance of the resulting
estimator, scaled by $n$, should be of the form $A^{-1}VA^{-1}$. A consistent estimator
could then be obtained by the plug-in method, i.e. replacing unknown
parameters by their corresponding empirical estimators.

Remark 2. It is unclear, however, whether or not the self-induced smoothing
will provide the right amount of smoothing, even in view of the results given
in Brown and Wang (2005). With over-smoothing, $\tilde\theta_n$ may be asymptotically
biased, i.e. the bias is not of order $o(n^{-1/2})$; with under-smoothing,
the "score" function (first derivative of $\tilde Q_n$) may have multiple "spikes" and
thus the second derivative matrix (Hessian) of $\tilde Q_n$ may not behave properly
and certainly cannot be expected to provide a consistent variance estimator.

In Subsection 2.2, we show that the self-induced smoothing here does
result in the right amount of smoothing in the sense that the bias is
asymptotically negligible and the Hessian matrix behaves properly. Before starting
the theoretical developments, we first describe our method.

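We first differentiate the smoothed objective function $\tilde Q_n$ to get the score

$S_n(\theta, \Sigma) = \frac{1}{n(n-1)} \sum_{i<j} H_{ij}\, \phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}}$,

where $\phi$ is the standard normal density and $H_{ij} = \mathrm{sgn}(Y_i - Y_j)$. This is a
U-process of order 2 whose kernel is, up to a constant factor,

$H_{ij}\, \phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}}$,

where $U_i$ denotes the pair $(Y_i, X_i)$.

By Hoeffding's decomposition, the asymptotic variance of $\sqrt n S_n(\theta)$ is
approximated by

$V_n(\theta, \Sigma) = \frac{1}{n^3} \sum_{i=1}^{n} \left\{ \sum_{j} \left[ H_{ij} \times \phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}} \right] \right\}^{\otimes 2}$,

where, for a vector $v$, $v^{\otimes 2} = vv'$. Thus, $V_n(\hat\theta_n, \Sigma)$ is used to estimate $V$,
the middle part of the "sandwich" variance formula discussed in Remark 1.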



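As for $A$, we differentiate $S_n(\theta)$ to get

(6)    $A_n(\theta, \Sigma) = \frac{1}{2n(n-1)} \sum_{i \ne j} H_{ij} \times \dot\phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \left(\frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}}\right)^{\otimes 2}$,

where $\dot\phi(z) = -z\phi(z)$ is the derivative of $\phi(z)$. Although the self-induced
smoothing was motivated earlier with $\Sigma$ being the limiting covariance matrix
of the estimator, we will show later that for any positive definite matrix $\Sigma$,
$A_n(\hat\theta_n, \Sigma)$ converges to $A$.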

Note that the above discussions about A and V are not mathematically 
rigorous. This is because the kernel function for the score process is sample 
size n-dependent. The usual asymptotic theory for the U-process is not ap- 
plicable. Indeed, our rigorous derivations, to be given in Subsection 2.3, are 
quite tedious, involving many approximations that are quite delicate. 

Let

(7)    $D_n(\theta, \Sigma) = A_n^{-1}(\theta, \Sigma) \times V_n(\theta, \Sigma) \times A_n^{-1}(\theta, \Sigma)$.

If $\theta$ is the true parameter value, then $D_n(\theta, \Sigma)$ converges to the limiting
covariance matrix, which is the desired choice for $\Sigma$ in the self-induced
smoothing. Therefore, (7) leads to an iterative algorithm of the form
$\hat\Sigma_n^{(k)} = D_n(\hat\theta_n, \hat\Sigma_n^{(k-1)})$; see also Brown and Wang (2005). Specifically, we propose
the following algorithm:

Algorithm 1. (SMRCE)

1. Compute the MRC estimator $\hat\theta_n$ and set $\hat\Sigma_n^{(0)}$ to be the identity matrix.

2. Update the variance-covariance matrix $\hat\Sigma_n^{(k)} = D_n(\hat\theta_n, \hat\Sigma_n^{(k-1)})$. Smooth
the rank correlation $Q_n(\theta)$ using the covariance matrix $\hat\Sigma_n^{(k)}$. Maximize
the resulting smoothed rank correlation to get an estimator $\tilde\theta_n^{(k)}$.

3. Repeat step 2 until $\tilde\theta_n^{(k)}$ converges.
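
The steps above can be sketched as follows for the scalar case $d = 1$ (two covariates, with the second coefficient fixed at 1), using only the Python standard library. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, the maximizations are done by a crude grid search, pairs with $\sigma_{ij} = 0$ are skipped, the score-variance term uses a $1/n^3$ normalization, and the stopping rule monitors both the point estimate and the variance estimate. The demo data at the bottom are simulated solely to exercise the functions.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pair_terms(theta, sigma, ys, xs):
    """Yield (i, H_ij, a_ij, b_ij) for d = 1, with a_ij = sqrt(n) X_ij'beta / sigma_ij
    and b_ij = sqrt(n) X_ij^(1) / sigma_ij. Pairs with sigma_ij = 0 are skipped."""
    n = len(ys)
    root_n = math.sqrt(n)
    for i in range(n):
        for j in range(n):
            dx1 = xs[i][0] - xs[j][0]
            s_ij = abs(dx1) * math.sqrt(sigma)
            if i == j or s_ij == 0.0:
                continue
            h = (ys[i] > ys[j]) - (ys[i] < ys[j])  # sgn(Y_i - Y_j)
            lin = theta * dx1 + (xs[i][1] - xs[j][1])
            yield i, h, root_n * lin / s_ij, root_n * dx1 / s_ij


def rank_corr(theta, ys, xs):
    """Unsmoothed rank correlation for d = 1."""
    n = len(ys)
    hits = sum(1 for i in range(n) for j in range(n)
               if ys[i] > ys[j]
               and theta * (xs[i][0] - xs[j][0]) + (xs[i][1] - xs[j][1]) > 0)
    return hits / (n * (n - 1))


def smoothed_rank_corr(theta, sigma, ys, xs):
    """Smoothed criterion, up to a theta-free constant from the skipped pairs."""
    n = len(ys)
    total = sum(norm_cdf(a) for _, h, a, _ in pair_terms(theta, sigma, ys, xs) if h > 0)
    return total / (n * (n - 1))


def sandwich_parts(theta, sigma, ys, xs):
    """Return (A_n, V_n): the second-derivative term and the score-variance term."""
    n = len(ys)
    inner = [0.0] * n
    a_sum = 0.0
    for i, h, a, b in pair_terms(theta, sigma, ys, xs):
        inner[i] += h * norm_pdf(a) * b
        a_sum += h * (-a * norm_pdf(a)) * b * b  # phi-dot(a) = -a * phi(a)
    return a_sum / (2.0 * n * (n - 1)), sum(v * v for v in inner) / n ** 3


def smrce(ys, xs, grid, max_iter=10, tol=1e-6):
    """Iterate the variance update and re-maximise the smoothed criterion (d = 1)."""
    theta_mrc = max(grid, key=lambda t: rank_corr(t, ys, xs))  # step 1
    sigma, theta_s = 1.0, theta_mrc
    for _ in range(max_iter):
        a_n, v_n = sandwich_parts(theta_mrc, sigma, ys, xs)  # step 2
        denom = a_n * a_n
        if denom == 0.0 or not math.isfinite(denom):
            break  # guard: sandwich formula undefined
        new_sigma = v_n / denom
        if not (math.isfinite(new_sigma) and new_sigma > 0.0):
            break
        new_theta = max(grid, key=lambda t: smoothed_rank_corr(t, new_sigma, ys, xs))
        converged = abs(new_theta - theta_s) < tol and abs(new_sigma - sigma) < tol * sigma
        sigma, theta_s = new_sigma, new_theta
        if converged:  # step 3
            break
    return theta_s, sigma


# Simulated demo data (assumed design, for illustration only).
rng = random.Random(0)
xs_demo = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(30)]
ys_demo = [math.exp(1.6 * x1 + x2 + 0.5 * rng.gauss(0.0, 1.0)) for x1, x2 in xs_demo]
grid_demo = [0.2 * k for k in range(21)]  # theta in [0, 4]
theta_tilde, sigma_hat = smrce(ys_demo, xs_demo, grid_demo)
std_err = math.sqrt(sigma_hat / len(ys_demo))  # standard error for theta_tilde
```

Since `sigma_hat` estimates the variance of $\sqrt n(\tilde\theta_n - \theta_0)$, the standard error is obtained by dividing by $n$ before taking the square root. For $d > 1$ the scalars `sigma`, `a_n` and `v_n` become matrices and the division becomes a two-sided matrix inverse.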

2.2. Large-sample properties. This subsection is devoted to the large 
sample theory. The main results are: 1. the smoothed MRC estimator (SM- 
RCE) is asymptotically equivalent to the MRC estimator; 2. the proposed 
method leads to a consistent variance estimator; and 3. the iterative algo- 
rithm for point and variance estimation converges numerically. 

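We first introduce notation as well as assumptions, which are similar to
those in Sherman (1993) for the MRC estimator. Let

(8)    $\tau(y, x; \theta) = E\left\{ I[y > Y]\, I[(x - X)'\beta(\theta) > 0] + I[y < Y]\, I[(x - X)'\beta(\theta) < 0] \right\}$,

which is the projection of the kernel of the U-process $Q_n(\theta)$. The expectation is
taken with respect to $(X, Y)$. Also let

$|\nabla_m|\tau(y, x; \theta) = \sum_{i_1, \ldots, i_m} \left| \frac{\partial^m \tau(y, x; \theta)}{\partial\theta_{i_1} \cdots \partial\theta_{i_m}} \right|$.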

The following Assumptions 1 and 2 are used in Han (1987) (see also Sher- 
man, 1993) to establish consistency for the MRC estimator. For asymptotic 
normality, we need an additional regularity condition (Assumption 3) given 
in Sherman (1993). 

Assumption 1. The true parameter value $\theta_0$ is an interior point of $\Theta$,
which is a compact subset of the $d$-dimensional Euclidean space $R^d$.

Assumption 2. The support of $X$ is not contained in any linear subspace
of $R^{d+1}$. Conditional on the first $d$ components of $X$, the last component
of $X$ has a density function with respect to the Lebesgue measure.

Assumption 3. There exists a neighborhood, $\mathcal N$, of $\theta_0$ such that for
each pair $(y, x)$ of possible values of $(Y, X)$,

(i) The second derivatives of $\tau(y, x; \theta)$ with respect to $\theta$ exist in $\mathcal N$.

(ii) There is an integrable function $M_1(y, x)$ such that for all $\theta$ in $\mathcal N$,

$\|\nabla_2\tau(y, x; \theta) - \nabla_2\tau(y, x; \theta_0)\|_2 \le M_1(y, x)|\theta - \theta_0|$.

(iii) $E(|\nabla_1|\tau(Y, X; \theta_0))^2 < +\infty$.

(iv) $E|\nabla_2|\tau(Y, X; \theta_0) < +\infty$.

(v) The matrix $E\nabla_2\tau(Y, X; \theta_0)$ is strictly negative definite.

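Proposition 1. (Sherman, 1993) Assume that Assumptions 1-3 hold.
We have, uniformly over any $o_p(1)$ neighborhood of $\theta_0$,

(9)    $Q_n(\theta) - Q_n(\theta_0) = \frac{1}{2}(\theta - \theta_0)'A_0(\theta - \theta_0) + \frac{1}{\sqrt n}(\theta - \theta_0)'W_n + o_p(|\theta - \theta_0|^2) + o_p(1/n)$,

where $W_n = \frac{1}{\sqrt n}\sum_{i} \nabla_1\tau(Y_i, X_i; \theta_0)$, $2A(\theta) = E\nabla_2\tau(Y, X; \theta)$ and $A_0 = A(\theta_0)$.
Consequently, for the MRC estimator $\hat\theta_n$,

(10)    $\sqrt n(\hat\theta_n - \theta_0) = -A_0^{-1}W_n + o_p(1) \xrightarrow{d} N(0, D_0)$,

where $D(\theta) = A^{-1}(\theta)V(\theta)A^{-1}(\theta)$, $V(\theta) = E\left(\nabla_1\tau(Y, X; \theta)[\nabla_1\tau(Y, X; \theta)]'\right)$
and $D_0 = D(\theta_0)$.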

Because of the standardization, the rank correlation criterion function $Q_n$
is bounded by 1. It is not difficult to establish a uniform law of large numbers

(11)    $\lim_{n\to\infty} \sup_{\theta\in\Theta} |Q_n(\theta) - Q(\theta)| = 0$, a.s.,

where $Q(\theta)$ is the expectation of $Q_n(\theta)$; cf. Han (1987) and Sherman (1993).
Likewise, we can show that such uniform convergence also holds for $\tilde Q_n$, i.e.

(12)    $\lim_{n\to\infty} \sup_{\theta\in\Theta} |\tilde Q_n(\theta) - Q(\theta)| = 0$, a.s.

Note that the limit $Q$ remains the same.

In the following theorem, we claim that the estimate obtained from max- 
imizing the smoothed rank correlation function (4) is also asymptotically 
normal with the same asymptotic covariance matrix as Han's MRCE. 

Theorem 1. For any given positive definite matrix $\Sigma$, let $\tilde Q_n(\theta)$ be
defined as in (4) and $\tilde\theta_n = \arg\max_{\theta\in\Theta} \tilde Q_n(\theta)$. Then, under
Assumptions 1-3, $\tilde\theta_n$ is consistent, $\tilde\theta_n \to \theta_0$ a.s., and asymptotically normal,

$\sqrt n(\tilde\theta_n - \theta_0) \xrightarrow{d} N(0, D_0)$,

where $D_0$ is defined as in Proposition 1. In addition, $\tilde\theta_n$ is asymptotically
equivalent to $\hat\theta_n$ in the sense that $\tilde\theta_n = \hat\theta_n + o_p(n^{-1/2})$.

Recall that (7) defines the sandwich-type variance estimator by pretending
that $\tilde Q_n$ is a standard smooth objective function. Theorem 2 below shows
that (7) is consistent.

Theorem 2. Let $\hat\theta_n$ be the MRC estimator and $D_n$ be defined by (7).
Then, for any fixed positive definite matrix $\Sigma$, $D_n(\hat\theta_n, \Sigma)$ converges in
probability to $D_0$, the limiting variance-covariance matrix of $\sqrt n(\hat\theta_n - \theta_0)$.

Remark 3. The self-induced smoothing uses the limiting covariance
matrix $D_0$ as $\Sigma$. In practice, we may initially choose the identity matrix for
$\Sigma$, which is the same as the initial step in Algorithm 1. By Theorem 2,
we know that the one-step estimator $\hat\Sigma_n^{(1)}$ in Algorithm 1 converges in
probability to the true covariance. However, this one-step estimator depends
on the initial choice of $\Sigma$. Algorithm 1 is an iterative algorithm with the
variance-covariance estimator converging to the fixed point of $D_n(\hat\theta_n, \Sigma) = \Sigma$.



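Convergence of Algorithm 1 is ensured by the following theorem. For
notational simplicity, we let vech($B$) be the vectorization of matrix $B$. For
any function $v$ of $\Sigma$, we write

$\frac{\partial}{\partial\Sigma} v = \frac{\partial}{\partial\,\mathrm{vech}(\Sigma)} v$

for the vector of partial derivatives $\partial v/\partial\Sigma_{r,s}$,
where $\Sigma_{r,s}$ denotes the $(r, s)$ entry of $\Sigma$.

Theorem 3. Let $\hat\Sigma_n^{(k)}$ be defined as in Algorithm 1. Suppose that
Assumptions 1-3 hold. Then there exist $\Sigma_n^*$, $n \ge 1$, such that for any $\epsilon > 0$,
there exists $N$, such that for all $n > N$,

$P\left( \lim_{k\to\infty} \hat\Sigma_n^{(k)} = \Sigma_n^*,\ \|\Sigma_n^* - D_0\| < \epsilon \right) > 1 - \epsilon$.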



SELF-INDUCED SMOOTHING FOR TRANSFORMATION MODELS 






Remark 4. For a fixed $n$, $\Sigma_n^*$ represents the fixed point matrix in the
iterative algorithm. The above theorem shows that with probability
approaching 1, the iterative algorithm converges to a limit, as $k \to \infty$, and the
limit converges in probability to the limiting covariance matrix $D_0$.

Remark 5. The speed of convergence of $\hat\Sigma_n^{(k)}$ to $\Sigma_n^*$ is faster than any
exponential rate in the sense that $\|\hat\Sigma_n^{(k)} - \Sigma_n^*\| = o(\eta^k)$ for any $\eta > 0$. This
can be seen from Step 2 of Algorithm 1 in Subsection 2.1 and (13) below,

(13)    $\sup_{\|\theta - \theta_0\| = o(1),\ \Sigma \in \mathcal N(D_0)} \left\| \frac{\partial}{\partial\Sigma} D_n(\theta, \Sigma) \right\| = o_p(1)$,

which will be proved in the Appendix. Here $\mathcal N(D_0)$ is a small neighborhood
of $D_0$ and $\Sigma$ is a positive definite matrix.

2.3. Proofs. In this section, we provide proofs for (1) asymptotic equiva- 
lence of SMRCE to MRCE, (2) consistency of the induced variance estima- 
tor and (3) convergence of Algorithm 1. Some of the technical developments 
used in the proofs will be given in the Appendix. 

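Proof of Theorem 1. Without loss of generality, we assume $\theta_0 = 0$.
As in Subsection 2.1, let $Z$ be a $d$-variate normal random vector with mean 0
and covariance matrix $\Sigma$. Define

$\tilde Q_n(\theta) = E_Z Q_n(\theta + Z/\sqrt n)$.

Let $\Gamma_n(\theta) = Q_n(\theta) - Q_n(\theta_0)$ and $\tilde\Gamma_n(\theta) = E_Z\Gamma_n(\theta + Z/\sqrt n) = \tilde Q_n(\theta) - Q_n(\theta_0)$.
Define

$\tilde\theta_n = \arg\max_{\Theta} \tilde Q_n(\theta) = \arg\max_{\Theta} \tilde\Gamma_n(\theta)$.

Let $\Omega_n = \{\|Z\|_2 > 2d\log n\}$, where $\|Z\|_2 = \sqrt{Z'Z}$. Then $P(\Omega_n) = o(n^{-2})$
due to the Gaussian tail of $Z$. Since $|Q_n(\theta)| \le 1$ and $|\Gamma_n(\theta)| \le 2$,

$|E_Z\{\Gamma_n(\theta + Z/\sqrt n)\, I[\Omega_n]\}| \le 2P(\Omega_n) = o(n^{-2})$.

By the Cauchy-Schwarz inequality,

$E_Z\{|Z|\, I[\Omega_n]\} = o(n^{-2})$ and $E_Z\{|Z|^2 I[\Omega_n]\} = o(n^{-2})$.

By (9), uniformly over $o(1)$ neighborhoods of 0,

$E_Z\{\Gamma_n(\theta + Z/\sqrt n)\, I[\Omega_n^c]\} = (1/2)E_Z\{(\theta + Z/\sqrt n)'A_0(\theta + Z/\sqrt n)\, I[\Omega_n^c]\}$
    $+\ (1/\sqrt n)E_Z\{(\theta + Z/\sqrt n)'W_n\, I[\Omega_n^c]\} + o_p(E_Z\{|\theta + Z/\sqrt n|^2 I[\Omega_n^c]\}) + o_p(1/n)$.

Note that

$E_Z\{|\theta + Z/\sqrt n|^2\} \le 2(E_Z|\theta|^2 + E_Z|Z|^2/n) = O(|\theta|^2 + 1/n)$.

Therefore, uniformly over $o(1)$ neighborhoods of 0, we have

(14)    $\tilde\Gamma_n(\theta) = (1/2)\theta'A_0\theta + (1/\sqrt n)\theta'W_n + E(Z'A_0Z)/(2n) + o_p(|\theta|^2 + 1/n)$.

Replacing $\theta$ in (14) with $\theta_0 = 0$ and subtracting it from $\tilde\Gamma_n(\theta)$, we have

(15)    $\tilde\Gamma_n(\theta) - \tilde\Gamma_n(\theta_0) = \frac{1}{2}\theta'A_0\theta + \frac{1}{\sqrt n}\theta'W_n + o_p(|\theta|^2 + 1/n)$.

Combining (15) with Lemma 1 in the Appendix, we get

(16)    $\sqrt n(\tilde\theta_n - \theta_0) = -A_0^{-1}W_n + o_p(1)$.

Therefore, from (10) and (16), we have

$\sqrt n(\tilde\theta_n - \hat\theta_n) = o_p(1)$.

Finally, strong consistency of $\tilde\theta_n$ follows from the uniform almost sure convergence
of $\tilde Q_n$ as stated in (12). This completes the proof. □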

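Proof of Theorem 2. For notational simplicity, we assume throughout
the proof that $\Sigma$ is the identity matrix. The same argument, with
modifications to include constants for upper and lower bounds, may be applied to
deal with a general covariance matrix $\Sigma$.

We first show

(17)    $A_n(\hat\theta_n) \xrightarrow{p} A(\theta_0)$.

By definition, $[A_n(\theta)]_{r,s} = \partial^2\tilde Q_n(\theta)/(\partial\theta_r\partial\theta_s)$. As defined in (4), $\tilde Q_n(\theta)$ has
the following integral representation,

$\tilde Q_n(\theta) = \int Q_n(\theta + z/\sqrt n)\,(2\pi)^{-d/2}\exp(-\|z\|_2^2/2)\,dz$.

By the change of variable $t = \theta + z/\sqrt n$,

(18)    $\tilde Q_n(\theta) = \int Q_n(t)K_n(t, \theta)\,dt$,

where $K_n(t, \theta) = (2\pi)^{-d/2}n^{d/2}\exp(-n\|t - \theta\|_2^2/2)$. From (18),

$\frac{\partial}{\partial\theta_r}\tilde Q_n(\theta) = \int Q_n(t)\dot K_{n,r}(t, \theta)\,dt$

and

$\frac{\partial^2}{\partial\theta_r\partial\theta_s}\tilde Q_n(\theta) = \int Q_n(t)\ddot K_{n,r,s}(t, \theta)\,dt$,

where $\dot K_{n,r}(t, \theta) = \partial K_n(t, \theta)/\partial\theta_r$ and $\ddot K_{n,r,s}(t, \theta) = \partial^2K_n(t, \theta)/(\partial\theta_r\partial\theta_s)$.
In view of (6), to show (17), it suffices to prove

(19)    $\int Q_n(t)\ddot K_{n,r,s}(t, \theta)\,dt = [A(\theta_0)]_{r,s} + o_p(1)$

uniformly over $\|\theta - \theta_0\| = O(n^{-1/2})$. To show (19), we define

$\Omega_{n,r} = \left\{ t : (t_r - \theta_r)^2 < \frac{4\log n}{n},\ \sum_{s \ne r}(t_s - \theta_s)^2 < \frac{2(d-1)\log n}{n} \right\}$.

By Lemma 2(i) and the boundedness of $Q_n(t)$, we have

$\int_{(\Omega_{n,r}\cap\Omega_{n,s})^c} Q_n(t)\ddot K_{n,r,s}(t, \theta)\,dt = o(n^{-1/2})$,

where for a set $B$, $B^c$ denotes its complement. Therefore, (19) reduces to

(20)    $\int_{\Omega_{n,r}\cap\Omega_{n,s}} Q_n(t)\ddot K_{n,r,s}(t, \theta)\,dt = [A(\theta_0)]_{r,s} + o_p(1)$.

To show (20), we establish a quadratic expansion of $Q_n(t)$ for $t \in \Omega_{n,r}\cap\Omega_{n,s}$.
Since $\|t - \theta\|_2 < \sqrt{4d\log n/n}$ for $t \in \Omega_{n,r}\cap\Omega_{n,s}$ and $\|\theta - \theta_0\|_2 = O(n^{-1/2})$,
it follows that $\|t - \theta_0\|_2 = o(1)$. Therefore, by (9),

(21)    $Q_n(t) = Q_n(\theta_0) + \frac{1}{2}(t - \theta_0)'A(\theta_0)(t - \theta_0) + (t - \theta_0)'W_n/\sqrt n + o_p(|t - \theta_0|^2) + o_p(1/n)$.

Therefore, the left hand side of (20) equals I + II + III + IV, where

I = $\int_{\Omega_{n,r}\cap\Omega_{n,s}} [o_p(|t - \theta_0|^2) + o_p(1/n)]\,\ddot K_{n,r,s}(t, \theta)\,dt$,

II = $Q_n(\theta_0) \times \int_{\Omega_{n,r}\cap\Omega_{n,s}} \ddot K_{n,r,s}(t, \theta)\,dt$,

III = $\int_{\Omega_{n,r}\cap\Omega_{n,s}} (t - \theta_0)'W_n/\sqrt n\ \ddot K_{n,r,s}(t, \theta)\,dt$,

IV = $\frac{1}{2}\int_{\Omega_{n,r}\cap\Omega_{n,s}} (t - \theta_0)'A(\theta_0)(t - \theta_0)\,\ddot K_{n,r,s}(t, \theta)\,dt$.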

By the definition of $\Omega_{n,r}$, 



III < 



/ (logn)2 \ 



|iC„,,,,(t,6>)|dt. 



By Lemma 2(ii), I = Op(l). Furtliermore, II = o(n ^^^) due to Lemma 2(iii). 
Note that 



(22) +(6'-6'o) / Kn,rA-^,e)dt 



= (6>-6>o) / if„,,,s(t,0)dt, 

where the last equality follows from the fact that fln,r and fln,s are symmet- 
ric at and (t — 6)Kn^r,s{^, ^) is an odd function of [t — 6]r for r = 1, 2, ...,d. 
Combining this with Lemma 2(i), we have III = o(n~-'-). Again by symme- 
try, 

/ (t - 6>o)'A(6>o)(t - 6>o)A"n,r,s(t, e)dt 

(23) = [ {t-eyA{eo){t-e)kn,r,s{^,e)dt 

+ (0-0o)'A(0o)(0-0o) / Kn,rA^,e)dt. 

By Lemma 2 (i) and (iv), IV = [A(0o)]r,s + o{n~^^'^). Combining the ap- 
proximations for I — IV, we get (20). 

Next we prove $V_n(\hat\theta_n) \xrightarrow{p} V(\theta_0)$ by showing, componentwise,

(24)    $[V_n(\theta)]_{r,s} = [V(\theta_0)]_{r,s} + o_p(1)$

uniformly over $\|\theta - \theta_0\| = O(n^{-1/2})$ for $r, s = 1, \ldots, d$.
Define

$q(u, \tilde u; \theta) = I[y > \tilde y]\, I[(x - \tilde x)'\beta(\theta) > 0] + I[y < \tilde y]\, I[(x - \tilde x)'\beta(\theta) < 0]$,

where $u = (y, x)$ and $\tilde u = (\tilde y, \tilde x)$. In addition, let $\tau_n(u, \theta) = \int q(u, \tilde u; \theta)F_n(d\tilde u)$,
where $F_n(\cdot)$ is the empirical distribution of the $U_i$'s. By definition, 



i=l 



" 2 e 2 (iz 



d_ 

dfs 



, „ Z . , ._d _ INb , 
Tn{ui,e + ^)(27r) 2e 2 dz 
n 



Letting t = + z/-^/ri and u = 9 + z/-^/n, we have 

i=l ^ ' 



Gn{t,U})kn,r{t,e)kn,s{'^,G)dtdu, 



where Gn(t,a;) = ^ r„,(uj, t)T„,(uj, cj), which is bounded by and 1. 
By Lemma 2 (vii), 



(25) 



[Yn{e)]r,s=o{n-2) 



uniformly over $\|\theta - \theta_0\| = O(n^{-1/2})$. Let $f(u, v, w; \theta_1, \theta_2) = q(u, v; \theta_1) \times q(u, w; \theta_2)$
and let $f^*(u, v, w; \theta_1, \theta_2)$ be the symmetrized $f$. By definition, 



Gn(01,02) = 7^ r(Ui,U„Ufc;0i,02) 



(26) 



i<j<k 



+ ^ J^r(u„Uj,Uj;6li,6l2) = [/„ + ^[/„ 



Clearly $U_n$ is a third-order U-statistic and $\tilde U_n$ is a second-order U-statistic.
Applying Hoeffding's decomposition (van der Vaart, 1998, Section 12.3),

(27)    $U_n = \sum_{c=0}^{3} \binom{3}{c} U_{n,c}$,



where Un,c is a U-statistics of order c (c = 0, 1, 2, 3) and defined as 
Un,c=-73r Yl -7!v:YPB[f*iui^,Ui^,Ui^; 01,02)]. 

IJ \B\=c i 

Here, adopting the notation of van der Vaart (1998, Section 11.4), we
define $P_B[f^*(u_{i_1}, u_{i_2}, u_{i_3}; \theta_1, \theta_2)]$ as a projection of $f^*$ such that

$P_{\emptyset}f^* = Ef^*$,
$P_{\{i\}}f^* = E[f^*|u_i] - Ef^*$,
$P_{\{i,j\}}f^* = E[f^*|u_i, u_j] - E[f^*|u_i] - E[f^*|u_j] + Ef^*$,
$P_{\{1,2,3\}}f^* = E[f^*|u_1, u_2, u_3] - \sum_{i<j} E[f^*|u_i, u_j] + \sum_{i=1,2,3} E[f^*|u_i] - Ef^*$.

We know from Hoeffding's decomposition that $U_{n,2}$ and $U_{n,3}$ are second-
and third-order degenerate U-statistics with bounded kernels and thus of
order $O_p(n^{-1})$ and $O_p(n^{-3/2})$; see Sherman (1992, Corollary 8). Therefore,
by Lemma 2(vi),

(28)    $\int U_{n,c}\,\dot K_{n,r}(t, \theta)\dot K_{n,s}(\omega, \theta)\,dt\,d\omega = o_p(1)$, for $c = 2, 3$.

Replacing $U_{n,c}$ by $\tilde U_n/n$ in (28) also results in $o_p(1)$. Then combining this
and (28) with (26) and (27), (25) reduces to

(29)    $[V_n(\theta)]_{r,s} = 3 \times \int U_{n,1} \times \dot K_{n,r}(t, \theta)\dot K_{n,s}(\omega, \theta)\,dt\,d\omega$
            $+ \int Ef^* \times \dot K_{n,r}(t, \theta)\dot K_{n,s}(\omega, \theta)\,dt\,d\omega + o_p(1)$.



Let /i(uj;t,u;) = £;[/(u, v, w; t, ^)|u = u^], f2{wj;t,u) = £;[/(u, v, w; t, c^)|v 
Vj] and f-^{-Wj;t,u) = £'[/(u, v, w; t, a;)|w = Wj]. We define ^^^(t,^;) = 
1 " 

E/i(u,;t,a^). By the definitions of /(u, v, w; t, a^) and q(u,v,6), we 



n . 



1 

have Gn(t,u) = — > T(uj, t)T(uj, a;). By Lemma 3 and applying integra- 
n ^-^ 

i=l 

tion by parts twice, 



fi^ aT(u.:.e + ^)aT(u.,e + ^) ) 



where fJ„^r := {z : z,^ < 4 log n, ^^^^ < 2((i — 1) log n}. By Lemma 3, 
i=l ^ * 

uniformly over {{e\,e%) : \\e* - ^olb = o(l),i = 1,2}. Therefore, 
1 " 

- V/i(uj;t,a;)xi^:„.,(t,6l)K„,,(c^,6l)dtd^ = [V(6>o)]r,.+0p(l). 



Similarly, applying integration by parts and by Lemma 3 and 2(vi), we have 
1 " 

-5^/2(vj;t,a;)xK„,,(t,6>)A'„,,(cj,6>)dtdu; = [V(6'o)]r,s+Op(l), 



1 " 

-'E/3(wj;t,a;)xK„,,(t,0)i^„,,(cj,6>)dt(ia; = [V(6>o)]r.,s+Op(l), 



^[/(u,v,w;t,^)]xJ^„,,(t,6>)ii:„,,(u;,0)dtdu; = [V(0o)]r,s+Op(l). 



Hence the right hand side of (29) is $[V(\theta_0)]_{r,s} + o_p(1)$, which gives (24).
From (17) and (24), $D_n(\hat\theta_n) \xrightarrow{p} D_0$. □



Dr 



Proof of Theorem 3. Prom Theorem (2), we know that —, 
and 5]„(0„,Do) Dq. By the mean value theorem, 

\0n "* ~ Do]r,s = [^n(^nj ^) — ^^^(^n, Do)]r,s + [5^n(^n, Dq) — Do]r,s 

X t>ec/i(si^^ - Do) + [S„(^„, Do) - D, 







/ 


_9S ^ "^"'^ 


S=S*- 


X 



Ojr,; 



where - Do|| < ||S„ - Do|| and thus G AA(Do). In view of Lemma 
4 and S„(0„,Do) — > Do, 5]„ — )• Dq. Again by the mean value theorem, 



.^{k+D ^(1), 

i^n ~ \r,s 



" ^ r-n 1 




/ 




S=S*. 


X 



where — Do|| < — Do||. Then by Lemma 4 and mathematical 

induction, we know that for any e > and r] > 0, there exist K and N, such 
that for any n > N and k > K, 



(fc) _ ^{k-i) 

n n 



]r.,s|, for all /c > i^l > 1 - e 
where $1 \le s, r \le d$. Note that the inequality inside the above probability
implies that $\hat\Sigma_n^{(k)}$ converges as $k \to \infty$ and the limit $\Sigma_n^*$ satisfies
$\Sigma_n^* = D_n(\hat\theta_n, \Sigma_n^*)$ and $\Sigma_n^* \xrightarrow{p} D_0$. □



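3. Extensions. In this section, we extend the approach to the partial
rank correlation (PRC) criterion function $Q_n^*$, defined by (3), of Khan and
Tamer (2007) for censored data. Under the usual conditional independence
between failure and censoring times given covariates and additional
regularity conditions, Khan and Tamer (2007) developed asymptotic properties for
the PRCE that are parallel to those of Sherman (1993).

The same self-induced smoothing can be applied to the partial rank
correlation criterion function to get

(29)    $\tilde Q_n^*(\theta) = E_ZQ_n^*(\theta + Z/\sqrt n) = \frac{1}{n(n-1)} \sum_{i \ne j} \Delta_j\, I[\tilde Y_i > \tilde Y_j]\, \Phi\left(\sqrt n X_{ij}'\beta(\theta)/\sigma_{ij}\right)$.

We define its maximizer, $\tilde\theta_n^*$, as the smoothed partial rank correlation
estimator (SPRCE). Let

(30)    $A_n^*(\theta, \Sigma) = \frac{1}{2n(n-1)} \sum_{i \ne j} H_{ij} \times \dot\phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \left(\frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}}\right)^{\otimes 2}$,

(31)    $V_n^*(\theta, \Sigma) = \frac{1}{n^3} \sum_{i=1}^{n} \left\{ \sum_{j} \left[ H_{ij} \times \phi\left(\frac{\sqrt n X_{ij}'\beta(\theta)}{\sigma_{ij}}\right) \frac{\sqrt n X_{ij}^{(1)}}{\sigma_{ij}} \right] \right\}^{\otimes 2}$,

(32)    $D_n^*(\theta, \Sigma) = [A_n^*(\theta, \Sigma)]^{-1} \times V_n^*(\theta, \Sigma) \times [A_n^*(\theta, \Sigma)]^{-1}$,

where $H_{ij} = \Delta_j \times I[\tilde Y_i > \tilde Y_j] - \Delta_i \times I[\tilde Y_j > \tilde Y_i]$.

Based on $D_n^*(\theta, \Sigma)$, we have the following iterative algorithm to compute
the SPRCE and variance estimate simultaneously.

Algorithm 2. (SPRCE)

1. Compute the PRC estimator $\hat\theta_n^*$ and set $\hat\Sigma_n^{*(0)}$ to be the identity matrix.

2. Update the variance-covariance matrix $\hat\Sigma_n^{*(k)} = D_n^*(\hat\theta_n^*, \hat\Sigma_n^{*(k-1)})$. Smooth
the partial rank correlation $Q_n^*(\theta)$ using the covariance matrix $\hat\Sigma_n^{*(k)}$. Maximize
the resulting smoothed partial rank correlation to get an estimator $\tilde\theta_n^{*(k)}$.

3. Repeat step 2 until $\tilde\theta_n^{*(k)}$ converges.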

In addition to Assumptions 1-3, Khan and Tamer (2007) added the fol- 
lowing assumption for the consistency of PRCE. 

Assumption 4. Let $S_X$ be the support of $X_i$, and $\mathcal X_{uc}$ be the set
$\mathcal X_{uc} = \{x \in S_X : P(\Delta_i = 1 | X_i = x) > 0\}$.
Then $P(\mathcal X_{uc}) > 0$.



Similar to the rank correlation function, it can be shown that under
Assumptions 1-4, (9) and (11) still hold for the partial rank correlation function
$Q_n^*(\theta)$. Therefore, Theorems 1-3 in Section 2 continue to hold when replacing
the point and variance estimators for the smoothed rank correlation by the
corresponding ones for the smoothed partial rank correlation. Specifically,
for any positive definite matrix $\Sigma$, under Assumptions 1-4, we have

1. The SPRCE $\tilde\theta_n^*$ is asymptotically equivalent to the PRCE $\hat\theta_n^*$ in the
sense that $\tilde\theta_n^* = \hat\theta_n^* + o_p(n^{-1/2})$, and, therefore,

$\sqrt n(\tilde\theta_n^* - \theta_0) \xrightarrow{d} N(0, D_0^*)$,

where $D_0^*$ is the limiting variance-covariance matrix of $\hat\theta_n^*$.

2. The variance estimator is consistent: $D_n^*(\hat\theta_n^*, \Sigma) \xrightarrow{p} D_0^*$.

3. Algorithm 2 converges numerically in the sense that there exist $\Sigma_n^*$,
$n \ge 1$, such that for any $\epsilon > 0$, there exists $N$, such that for all $n > N$,

$P\left(\lim_{k\to\infty} \hat\Sigma_n^{*(k)} = \Sigma_n^*,\ \|\Sigma_n^* - D_0^*\| < \epsilon\right) > 1 - \epsilon$.

The proofs are similar to those of Theorems 1-3 in Section 2, and are, 
therefore, omitted. 

4. Numerical results. In this section, we first apply the proposed self- 
induced smoothing method to analyze the primary biliary cirrhosis (PBC) 
data (Fleming and Harrington, 1990, Appendix D) and compare the result 
with that using the Cox regression. We then report results from several 
simulation studies we conducted using the method. 

4.1. PBC data. We applied the smoothed PRCE to the survival times of the
first 312 subjects with no missing covariates in the PBC data. We included
two covariates, albumin and age50 (age divided by 50). We reparameterized
the transformation model (1) by setting $\beta_{\mathrm{age50}}$ to 1, and estimated $\theta_{\mathrm{albumin}}$
by SPRCE. We also calculated the PRCE for $\theta_{\mathrm{albumin}}$ and fitted the standard
Cox model. For the Cox regression, the ratio $\beta_{\mathrm{albumin}}/\beta_{\mathrm{age50}}$ is the estimate
of $\theta_{\mathrm{albumin}}$. The results are summarized in Table 1.

Table 1
Regression Analysis of PBC data

          Albumin    SE
SPRCE     -4.29      1.40
PRCE      -3.50      -
Cox       -3.04      0.60

Note that the PRCE does not have a readily available standard error estimate.
The standard error of $\beta_{\mathrm{albumin}}/\beta_{\mathrm{age50}}$ in the Cox model was estimated by the
delta method. Estimates from both the SPRCE and the Cox model indicate
that the ratio of $\beta_{\mathrm{albumin}}$ to $\beta_{\mathrm{age50}}$ is significant.



Fig 1. (Plot omitted; horizontal axis: Albumin.)

To further assess the self-induced smoothing procedure, we plot the origi- 
nal objective function as well as the smoothed one in the first and last steps 
of our algorithm, as shown in Figure 1. The top curve is the original objec- 
tive function, the middle curve is one after the initial smoothing, and the 
bottom curve is the limit of the iterative algorithm (after 8 iterations). It 
appears that the one-step smoothed objective function is under-smoothed 
in terms of the level of fluctuations, and the limiting curve is quite smooth. 



Table 2
The proportional hazard model without censoring

n      Parameter   Est     Mean    Bias            RMSE     SE       Coverage
500    θ           SMRCE   1.601    1.2 x 10^-3    0.0298   0.0316   92.3%
                   MRCE    1.601    0.8 x 10^-3    0.0340   -        -
                   Cox     1.599   -0.9 x 10^-3    0.0200   -        -
1000   θ           SMRCE   1.601    1.0 x 10^-3    0.0193   0.0212   93.9%
                   MRCE    1.600    0.2 x 10^-3    0.0225   -        -
                   Cox     1.600    0.1 x 10^-3    0.0141   -        -
2000   θ           SMRCE   1.600    0.2 x 10^-3    0.0136   0.0144   94.9%
                   MRCE    1.600   -0.1 x 10^-3    0.0158   -        -
                   Cox     1.600    0.1 x 10^-3    0.0100   -        -

Table 3
The proportional hazard model with censoring

n      Parameter   Est     Mean    Bias            RMSE     SE       Coverage
600    θ           SPRCE   1.604    3.7 x 10^-3    0.0282   0.0300   93.2%
                   PRCE    1.603    2.9 x 10^-3    0.0327   -        -
                   Cox     1.601    1.0 x 10^-3    0.0204   -        -
1200   θ           SPRCE   1.601    1.1 x 10^-3    0.0190   0.0201   93.9%
                   PRCE    1.601    0.8 x 10^-3    0.0217   -        -
                   Cox     1.600   -0.2 x 10^-3    0.0139   -        -
2400   θ           SPRCE   1.600    0.4 x 10^-3    0.0127   0.0136   95.4%
                   PRCE    1.600    0.1 x 10^-3    0.0148   -        -
                   Cox     1.600   -0.2 x 10^-3    0.0097   -        -

20 J. ZHANG, Z. JIN, Y. SHAO AND Z. YING 

Table 4

The linear model with Gaussian noise

n = 250
              Est      Mean           Bias                    RMSE           SE        coverage
$\theta_1$    SMRCE    1.615          $1.5 \times 10^{-2}$    0.0747         0.0756    91.7%
              MRCE     1.612          $1.2 \times 10^{-2}$    0.0730         -         -
              LS       1.601          $0.7 \times 10^{-3}$    0.0296         -         -
$\theta_2$    SMRCE    0.5042         $0.4 \times 10^{-2}$    0.0427         0.0443    93.6%
              MRCE     [illegible]    [illegible]             [illegible]    -         -
              LS       [illegible]    [illegible]             [illegible]    -         -

n = 500
              Est      Mean           Bias                    RMSE           SE        coverage
$\theta_1$    SMRCE    1.605          $4.9 \times 10^{-3}$    0.0515         0.0513    92.7%
              MRCE     1.607          $6.7 \times 10^{-3}$    0.0523         -         -
              LS       1.601          $0.7 \times 10^{-3}$    0.021          -         -
$\theta_2$    SMRCE    0.5023         $2.3 \times 10^{-3}$    0.0296         0.0302    94.6%
              MRCE     0.5042         $4.2 \times 10^{-3}$    0.0316         -         -
              LS       0.5006         $0.6 \times 10^{-3}$    0.0254         -         -

n = 1000
              Est      Mean           Bias                    RMSE           SE        coverage
$\theta_1$    SMRCE    1.603          $3.6 \times 10^{-3}$    0.0361         0.0348    92.4%
              MRCE     1.603          $3.4 \times 10^{-3}$    0.0382         -         -
              LS       1.601          $0.5 \times 10^{-3}$    0.0144         -         -
$\theta_2$    SMRCE    0.5009         $0.9 \times 10^{-3}$    0.0203         0.0207    94.8%
              MRCE     0.5018         $1.8 \times 10^{-3}$    0.0214         -         -
              LS       0.5004         $0.4 \times 10^{-3}$    0.0176         -         -


4.2. Simulation studies. We conducted simulation studies for a number
of cases. In the first case (Design I), we generated X from a bivariate normal
distribution with mean $[-10, 20]'$ and covariance matrix $\mathrm{diag}\{3^2, 2^2\}$.
We set $\beta_0 = (\theta, 1) = [1.6, 1]$ and generated $\epsilon$ from the probability density
function $f(w) = 2\exp(2w - \exp(2w))$. We specified the transformation $H$ through
$H^{-1}(y) = \log(y)$. This is indeed a Weibull proportional hazards model. The
sample sizes were n = 500, 1000, 2000 and the number of replications was
500. The SMRCE, the MRCE and the Cox model were used to estimate $\theta$, and the
standard error of the SMRCE was computed by Algorithm 1. The mean (Mean),
bias (Bias) and root mean square error (RMSE) for each method, as well as the
mean of the standard error (SE) and the coverage of the 95% confidence interval
for the SMRCE, are reported in Table 2.
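
The data-generating step of Design I can be sketched in a few lines. The code below is an illustration under stated assumptions, not the code behind the reported results: it reads the covariance as diag{3^2, 2^2}, takes H(u) = exp(u) from H^{-1}(y) = log(y), draws the error as half the logarithm of a unit exponential variable (which has exactly the density f(w) = 2exp(2w - exp(2w))), and evaluates the rank correlation objective in its standard form, the fraction of ordered pairs (i, j) with Y_i > Y_j and X_i'β(θ) > X_j'β(θ). The function names are hypothetical.

```python
import math
import random


def simulate_design_one(n, theta=1.6, seed=0):
    """Draw n triples (x1, x2, y) following Design I as read from the text."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1 = rng.gauss(-10.0, 3.0)  # assumed variance 3^2
        x2 = rng.gauss(20.0, 2.0)   # assumed variance 2^2
        e = rng.expovariate(1.0)
        while e == 0.0:             # guard against log(0); practically never triggers
            e = rng.expovariate(1.0)
        # If E ~ Exp(1), then 0.5 * log(E) has density 2 * exp(2w - exp(2w)).
        eps = 0.5 * math.log(e)
        # H(u) = exp(u), with beta(theta) = (theta, 1).
        y = math.exp(theta * x1 + x2 + eps)
        data.append((x1, x2, y))
    return data


def mrc_objective(data, theta):
    """Fraction of ordered pairs whose response order matches the index order."""
    n = len(data)
    index = [theta * x1 + x2 for x1, x2, _ in data]
    hits = 0
    for i in range(n):
        for j in range(n):
            if i != j and data[i][2] > data[j][2] and index[i] > index[j]:
                hits += 1
    return hits / (n * (n - 1))


if __name__ == "__main__":
    sample = simulate_design_one(200, seed=1)
    print(mrc_objective(sample, 1.6), mrc_objective(sample, -1.6))
```

Because the objective is a step function of θ, plotting `mrc_objective` over a fine grid shows the kind of roughness that motivates the smoothing. The double loop costs O(n^2) operations per evaluation, which is acceptable for an illustration of this size.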

The second case (Design II) is similar to the first one except that Y is
censored by a random variable C, which is independent of X and normally
distributed with mean $\mu = 9.2$ and variance $\sigma = 0.52$. The sample sizes were
n = 600, 1200, 2400 and the number of replications was 500. This design
is similar to that in Gørgens and Horowitz (1999). The SPRCE, the PRCE and the
Cox model were used to estimate $\theta$, and the standard error of the SPRCE was
computed by Algorithm 2. The resulting estimates are summarized in Table
3, where we also report the bias (Bias), root mean square error (RMSE), mean of
the standard error (SE), and coverage of the 95% confidence interval.

In the third case (Design III), we generated $X = [X_1, X_2, X_3]'$ in two
steps. We first generated $[X_1, X_3]'$ from a bivariate normal distribution with
mean $[-2, 2]'$ and an identity covariance matrix. We then generated $X_2$ as
0 or 2 with equal probability. We set $\beta_0 = (\theta_1, \theta_2, 1) = [1.6, 0.5, 1]$ and
generated $\epsilon$ from a normal distribution with $\mu = 0$ and $\sigma^2 = 0.5^2$. We set
the transformation $H(x) = x$. The sample sizes were n = 250, 500, 1000
and the number of replications was 500. The SMRCE, the MRCE and the least
squares method were applied to estimate $\theta_1$ and $\theta_2$, and the standard error
of the SMRCE was computed by Algorithm 1. Table 4 reports the mean (Mean),
bias (Bias) and root mean square error (RMSE) for each method, as well as the
mean of the standard error (SE) and the coverage of the 95% confidence interval
for the SMRCE.

From Tables 2, 3 and 4, we find that (1) the root mean squared error
is close to the mean standard error for the SMRCE (SPRCE); (2) as the
sample size increases, the bias decreases and the coverage of the 95% confidence
interval approaches the nominal level. These results indicate that the proposed
variance estimator is accurate and that Algorithms 1 and 2 work well.
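
The summary columns of Tables 2 to 4 follow directly from the replicated estimates. As a minimal sketch, assuming each replication yields a point estimate and a standard error, and that the 95% interval is the usual normal-theory interval (estimate plus or minus 1.96 standard errors), the columns could be computed as follows. The function name and the dictionary keys are hypothetical.

```python
import math


def summarize_replications(estimates, std_errors, truth, z=1.96):
    """Monte Carlo summary: Mean, Bias, RMSE, mean SE and interval coverage."""
    reps = len(estimates)
    mean = sum(estimates) / reps
    bias = mean - truth
    rmse = math.sqrt(sum((est - truth) ** 2 for est in estimates) / reps)
    mean_se = sum(std_errors) / reps
    # An interval covers the truth when |estimate - truth| <= z * SE.
    covered = sum(
        1 for est, se in zip(estimates, std_errors) if abs(est - truth) <= z * se
    )
    return {
        "Mean": mean,
        "Bias": bias,
        "RMSE": rmse,
        "SE": mean_se,
        "coverage": covered / reps,
    }
```

Comparing the "RMSE" and "SE" entries is the check described in point (1) above: when the variance estimator is accurate and the bias is small, the two should be close.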

5. Discussion. This paper provides a simple yet general recipe for
smoothing the discontinuous rank correlation criterion function. The smoothing
is self-induced in the sense that the implied bandwidth is essentially the
asymptotic standard deviation of the regression parameter estimator. It is
shown that such smoothing does not introduce any significant bias in that
the resulting estimator is asymptotically equivalent to the original maximum
rank correlation estimator, which is asymptotically normal. The smoothed
rank correlation can be used as if it were a regular smooth criterion function
in the usual M-estimation problem, in the sense that the standard sandwich-type
plug-in variance-covariance estimator is consistent. Simulation and real
data analysis provide additional evidence that the proposed method gives
the right amount of smoothing.
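
To make the "implied bandwidth" concrete, consider a scalar θ with β(θ) = (θ, 1) and suppose the smoothed criterion is the average of the original one over a Gaussian perturbation θ + Z/√n with Z ~ N(0, σ²). For a pair (i, j) with covariate differences d1 and d2, the indicator I[θ d1 + d2 > 0] then averages to Φ(√n (θ d1 + d2) / (σ |d1|)). The sketch below implements this form; it is an illustration under that assumption, with hypothetical function names, and `sigma2` stands for a working value of the variance of √n times the estimator.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def smoothed_mrc_objective(data, theta, sigma2):
    """Gaussian-averaged rank correlation for data given as (x1, x2, y) triples."""
    n = len(data)
    sigma = math.sqrt(sigma2)
    total = 0.0
    for i in range(n):
        xi1, xi2, yi = data[i]
        for j in range(n):
            if i == j:
                continue
            xj1, xj2, yj = data[j]
            if yi <= yj:
                continue
            d1 = xi1 - xj1
            diff = theta * d1 + (xi2 - xj2)
            scale = sigma * abs(d1)  # pair-specific bandwidth before the sqrt(n) factor
            if scale == 0.0:
                # No perturbation for this pair: fall back to the indicator.
                total += 1.0 if diff > 0 else 0.0
            else:
                total += normal_cdf(math.sqrt(n) * diff / scale)
    return total / (n * (n - 1))
```

As `sigma2` shrinks to zero the function reduces to the unsmoothed step-function criterion (apart from exact ties in the index), while a larger `sigma2` gives a smoother surface. No separate bandwidth is chosen: the amount of smoothing is governed entirely by the variance of the estimator.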

Because the family of transformation models contains both the proportional
hazards and accelerated failure time models as its submodels, the
new approach may be used for model selection. The specification test commonly
used in the econometrics literature (Hausman, 1978) may also be
used for testing a specific semiparametric model assumption. In addition,
the smoothed objective function makes it possible to fit a penalized
regression by introducing a LASSO-type penalty.

The method and theory developed herein can easily be extended to other
problems of a similar nature, i.e., discontinuous objective functions whose
associated estimators are asymptotically normal. In particular, we can apply
the self-induced smoothing to the estimator introduced by Chen (2002) for
the transformation function H to obtain a consistent variance estimator.

APPENDIX A: LEMMAS, COROLLARIES AND PROOFS 
Lemma 1 below is due to Sherman (1993, Theorem 2). 

Lemma 1. Let $\Gamma_n(\theta)$ denote a general objective function that is
centered and satisfies the same regularity conditions as in Sherman (1993).
Suppose $\theta_n := \arg\max_{\theta} \Gamma_n(\theta)$ is consistent for $\theta_0$, an interior point of $\Theta$.
Suppose also that uniformly over $o_p(1)$ neighborhoods of $\theta_0$,

$\Gamma_n(\theta) = \frac{1}{2}(\theta - \theta_0)' A (\theta - \theta_0) + \frac{1}{\sqrt{n}}(\theta - \theta_0)' W_n + o_p(|\theta - \theta_0|^2) + o_p(1/n)$,

where A is a negative definite matrix, and $W_n$ converges in distribution to
a $N(0, V)$ random vector. Then

$\sqrt{n}(\theta_n - \theta_0) = -A^{-1} W_n + o_p(1) \xrightarrow{d} N(0, A^{-1} V A^{-1})$.
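
The form of the limit in Lemma 1 can be read off from the quadratic approximation. The following heuristic, which ignores the remainder terms and is not a substitute for Sherman's argument, writes $h = \theta - \theta_0$ and maximizes the leading part of the expansion. It uses only the symmetry and negative definiteness of $A$.

```latex
\[
  \Gamma_n(\theta_0 + h) \approx \tfrac{1}{2}\, h' A h + n^{-1/2}\, h' W_n ,
  \qquad
  \frac{\partial}{\partial h}\Big( \tfrac{1}{2}\, h' A h + n^{-1/2}\, h' W_n \Big)
  = A h + n^{-1/2} W_n = 0 .
\]
\[
  \hat h = -\, n^{-1/2} A^{-1} W_n
  \quad\Longrightarrow\quad
  \sqrt{n}\,\hat h = -A^{-1} W_n ,
  \qquad
  \operatorname{Var}\!\big(-A^{-1} W_n\big) \to A^{-1} V A^{-1} .
\]
```

Since $A$ is negative definite, the stationary point is the unique maximizer of the quadratic, which is why the sandwich form $A^{-1} V A^{-1}$ appears in the limiting distribution.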

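Recall that in Theorem 2 we define $K_n(t, \theta) = (2\pi)^{-d/2} n^{d/2} \exp(-n\|t - \theta\|_2^2/2)$ and
its first and second partial derivatives with respect to $\theta$ as

$\dot K_{n,r}(t, \theta) = (2\pi/n)^{-d/2}\, n\, (t_r - \theta_r)\, e^{-n\|t - \theta\|_2^2/2}$,

$\ddot K_{n,r,r}(t, \theta) = (2\pi/n)^{-d/2}\, n\, \{n(t_r - \theta_r)^2 - 1\}\, e^{-n\|t - \theta\|_2^2/2}$,

$\ddot K_{n,r,s}(t, \theta) = (2\pi/n)^{-d/2}\, n^2\, (t_r - \theta_r)(t_s - \theta_s)\, e^{-n\|t - \theta\|_2^2/2}$, for $s \ne r$.

Also recall

$\Omega_{n,r} = \{t : (t_r - \theta_r)^2 < 4\log n/n,\ \sum_{i \ne r}(t_i - \theta_i)^2 < 2(d-1)\log n/n\}$.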

Then we have the following lemma. 

Lemma 2. Uniformly over ||0 — ^olb = 0(n~2)^ 
(i) [ F{t)Kr,rs{t,0)dt = o{^),yF{t) S.t.O<F{t)<l. 

(it) f -\kn,rA^,S)\dt = 0{l). 



(in) / Kn,r,s{^,0)dt = oin-^^"^). 



(iv) I ^(t - 6>)'A(t - G)kn,r,s{^,e)dt = [A]r,s + o{n-^). 

(v) [ |ii:„,,(t,0)|dt = O(n-3/2). 

(Vi) [ -^\kn,r{i,e)\dt = 0{l). 

(vii) For any given < G{t,u) < 1, 

J G{t,u)kn,r{t,e)kn,s{^,0)dtdu 

= / G{t,u)kn,r{t,0)kn,s{i^,e)dtdu; + o{n~i). 

Proof. Let $\tilde\Omega_{n,r} = \{t : t_r^2 < 4\log n/n,\ \sum_{i \ne r} t_i^2 < 2(d-1)\log n/n\}$, and
divide its complement into $\tilde\Omega^{a}_{n,r} := \{t : t_r^2 > 4\log n/n\}$ and
$\tilde\Omega^{b}_{n,r} := \{t : t_r^2 < 4\log n/n,\ \sum_{i \ne r} t_i^2 > 2(d-1)\log n/n\}$.
We prove (i)-(iv) for $s = r$.

For $s \ne r$, the proofs are similar and omitted.

For (i), note that 

/ F{t)kn,rA^,e)dt= [ F{t + e)kn,rA^,o)dt. 

Since < F(t) < 1 and (nt^ - l)I[n^^l] > 0, 

/ F{t + e)Knrr{t,0)dt = / F(t + 6')(— )-2n(nt2- l)e —dt 

Jn J ft n 



<(i:i)-f / nfnL^ - l)e ^(it 



=(27r)^2 /_ n^d{ntre~ — ) d^iV^U) = 

Jfi V- V"- 
where the last equality follows from $0 \le \int \prod_{i \ne r} d\Phi(\sqrt{n}\,t_i) \le 1$. Similarly,



F{t + e){—)-2{nX - n)e —(it 



r 2 
<(27r)"^ / n|nt^ - l\e~^ d^/ntrWd^{^/nti) < 8 



nilognf/'^ 1 

7. = O 



n 



For (ii), by definition, 
-j \Kn,r,r{t,e)\dt = {l-Ky^^ [ y/n\ntf. - l\e~^ dtrYld^{^/7iti 



< 



/ log n 



+ 



ntre 2 





tr = l/v^ 

< 1. 



0(1), 



where the inequality follows from < 

/ ( — r^(nHt-n)e ~dt 

'fin,, n 



For (iii), by definition, / Kn,r,r{i',0)dt 



2n2tre 2 



V n 



d^{y/nti) = o{-j=), where the last equality 
follows from < ^ ]Jd^>(Vrati) < 1. 

For (iv), by definition and applying integration by parts twice, 

/ ^{t-eyA{t-e)Kr,,rA^,e)dt 

= f ^t'At{2Try^ ^/nd{-ntre~^)Y[d^{^/nti) 

=o(n-^)+ / t'Aer(27r)-5^d(-e"^) JJd$(Vr^ti) 
=Ar,r- + o(n"^). 



where er' = (0, 0, 1, 0, 0) with r^^ entry being 1, and the last equality 
follows from the Gaussian tail probability. 
For (v), we know, by definition, / \Knr{'t,(^)\dt = / „ \Knr{'t,0)\dt. 

Jvf ' JCl 

By symmetry. 



r ■ 2 f 1 ntp 

/ \KnA^,0)\dt = -= / n--die-—)lld^V^ti) 



<^xl = 0(n-i). 



where the inequality follows from $0 \le \int \prod_{i \ne r} d\Phi(\sqrt{n}\,t_i) \le 1$. Similarly,

[ \K^,rAm\dt = ^ / - d{e-"4)YldHV^U) < 4 = 0(n- 

For (vi), by definition, 

^ / \Knr{t,e)\dt = / (27r)"2n~n|t^|e 2-cit 

=(27r)-5 2 / _ (i(e-^)J]d$(V^t,) = 0(1), 

where the second equality is due to symmetry and the third equality follows 
from < y JJc^^'(\/nti) < 1. 

To prove (vii), without loss of generality, we assume < G{t,uj) < 1. We 
denote fl^ as fln,r and ^n,r its complement. Then, 

< f \kn,rC^,^)\dtX f \kn,s{^,e)\du 
= \kn,ri^,0)\dt X \kn,si^,0)\du; 

where k and / are chosen from {a,b}. Then by (v) and (vi), 

G(t, u)kn,rCt, e)kn,s{^, d)dtdu 

= I G{t,u)knA-t,e)kn,s{^^d)dtdu + o{n-^). 



□ 
Lemma 3. Uniformly over $(t, \omega)$ such that $\|t - \theta_0\| = o(1)$ and
$\|\omega - \theta_0\| = o(1)$, we have



1 



i=l 



dT{Ui,t) 



5r(u, 00 



-r(u,0o) 



+ Op(l), 



i=l 



9r(uj,t) (9r(uj,u;) 



E 



dT{u,eo)dT{u,eo) 



+ Op{l). 



Proof. We sketch the main steps of the proof below. 
First of all, observe that 



1 " 
n ^ 



d6r 



T(Ui,^) 



< 



1 " 

-E 

n ^ 

i=\ 

1 " 

-E 



9r(ui,0o 



-i"(ui,^) 



1=1 

n 



5r(ui,t) 5r(ui,0o) 



< - VM2(ui) X |t-6>o|, 



X T 



i=l 



where $M_2(u)$ is an integrable function. The last inequality is due to Assumption
3 and $|\tau(u, \theta)| \le 1$.

Since $M_2(u)$ is integrable, by the law of large numbers, the left hand side
of the above inequality is thus $o_p(1)$. By a similar argument, we can show that



1 



E 

i=l 



dT{Ui,t) 



n 



E 

i=l 



gr(uj,0o) 



T{ui,9o) 



+ Op(l). 



By the law of large numbers, we get (A.1). The proof of (A.2) is similar. □

Lemma 4. Let A„ and be the same as those in (7). Then, for 1 < 
r,s < d, we have 



sup 

||0-0ui|=o(l),S6A^(Do) 



sup 

||e-euj|=o(l),EGA^(Do) 



d 



d 



[A„(6»,5])],,, = Op(l), 



where $\mathcal{N}(D_0)$ is a small neighborhood of $D_0$ and $\Sigma$ is a positive definite
matrix.
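Proof. We now extend the definition of the kernels in Lemma 2 to any
covariance matrix $\Sigma$ as follows,

$K_n(t, \theta, \Sigma) := (2\pi/n)^{-d/2}\, |\Sigma|^{-1/2} \exp\{-\frac{n}{2}(t - \theta)'\Sigma^{-1}(t - \theta)\}$,

where $|\Sigma|$ is the determinant of $\Sigma$. Then the first and second derivatives of
$K_n$ with respect to $\theta$ become

$\dot K_{n,r}(t, \theta, \Sigma) := (2\pi/n)^{-d/2}\, |\Sigma|^{-1/2}\, n\, e_r' \Sigma^{-1}(t - \theta) \exp\{-\frac{n}{2}(t - \theta)'\Sigma^{-1}(t - \theta)\}$,

$\ddot K_{n,r,s}(t, \theta, \Sigma) := (2\pi/n)^{-d/2}\, |\Sigma|^{-1/2}\, n\, e_r' [\, n\Sigma^{-1}(t - \theta)(t - \theta)'\Sigma^{-1} - \Sigma^{-1}\,]\, e_s \exp\{-\frac{n}{2}(t - \theta)'\Sigma^{-1}(t - \theta)\}$.

We partition $\mathbb{R}^d$ into $\Omega_{n,r}$ and its complement $\Omega^{c}_{n,r}$, where $\Omega_{n,r} := \{t :
(t - \theta)'\Sigma^{-1}(t - \theta) < 6d\log n/n\}$. Furthermore, we define $\tilde\Omega_{n,r} := \{t :
t'\Sigma^{-1}t < 6d\log n/n\}$.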

Note that (t - 0)(t - 0)'e-3(t-^)'^"'(t-^) is bounded for S G AA(L>o)- 
Similar to the proofs of Theorem 2 and Lemma 2, we can get 

uniformly over (0, such that \\6 — Oq\\ = o(l) and — Do|| = o(l). 
Likewise, we have / ^ ' dt = o(l), which, combined with 

— "'^„ ' ^' — ^(it = 0, implies / — ^ — ^dt = o(l). This completes 

the proof. □ 

Corollary 1. For I < r, s < d, we have 

d 



sup 

-9o||=o(l),S6A/'(Do) 



9S 



[An{e,-E)-\s = Opil). 
Proof. First, by Theorem 2, Lemma 4 and the mean value theorem,
we can show that $[A_n(\theta, \Sigma)]_{r,s} = [A_n(\theta, \Sigma) - A_n(\theta, D_0) + A_n(\theta, D_0)]_{r,s} =
[A(\theta_0)]_{r,s} + o_p(1)$. By matrix differentiation, $dA^{-1} = -A^{-1}(dA)A^{-1}$. Thus
$A_n^{-1} - A_0^{-1} = -A_0^{-1}(A_n - A_0)A_0^{-1} + o(\|A_n - A_0\|_1)$, where $A_0 = A(\theta_0)$.
The rest of the proof is straightforward and thus omitted. □

Lemma 5. For 1 < r, s < d, we have 

[D„(6>,5])],,, = Op(l). 

Proof. The result follows immediately from Lemma 4 and Corollary 1. □

APPENDIX B: A SUFFICIENT CONDITION FOR ASSUMPTION 3 

Suppose $f$ is the joint density for $(X, Y)$ and $f(\cdot \mid r, s)$ is the conditional
density of $X^{(2)}$ given $X^{(1)} = r$ and $Y = s$. Suppose $g(\cdot \mid s, \theta)$ is the conditional
density of $X'\beta(\theta)$ given $Y = s$ and $g(\cdot \mid r, s, \theta)$ is the conditional density of
$X'\beta(\theta)$ given $X^{(1)} = r$ and $Y = s$. By change of variable, $g(t \mid r, s, \theta) =
f(t - r'\theta \mid r, s)$. Therefore,

$g(t \mid s, \theta) = \int g(t \mid r, s, \theta)\, G_{X^{(1)}|s}(dr) = \int f(t - r'\theta \mid r, s)\, G_{X^{(1)}|s}(dr)$,

where $G_{X^{(1)}|s}$ is the conditional distribution of $X^{(1)}$ given $Y = s$. We also
observe that, for $z = (y, x)$,

$\tau(z, \theta) = \int_{-\infty}^{x'\beta(\theta)} \int_{-\infty}^{y} g(t \mid s, \theta)\, G_Y(ds)\, dt + \int_{x'\beta(\theta)}^{\infty} \int_{y}^{\infty} g(t \mid s, \theta)\, G_Y(ds)\, dt$,

where $G_Y$ is the marginal distribution of $Y$. Therefore, if the conditional
density $f_{X^{(2)}|X^{(1)},Y}(\cdot \mid r, s)$ has bounded derivatives up to order three for each
$(r, s)$ in the support of the space $X^{(1)} \otimes Y$, it is not difficult to show that
Assumption 3 is satisfied. The sufficient condition can be easily verified in
certain common situations, such as when the conditional density $f_{X^{(2)}|X^{(1)},Y}$
is normal.